sparkSession.sharedState.cacheManager
CacheManager — In-Memory Cache for Cached Tables
CacheManager is an in-memory cache for cached tables (as logical plans). It uses the internal cachedData collection of CachedData to track logical plans and their cached InMemoryRelation representation.
CacheManager is shared across SparkSessions through SharedState.
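The sharing described above can be pictured with a minimal sketch. Its assumptions: ToySharedState, ToySession and ToyCacheManager are hypothetical stand-ins, not Spark classes, and a cached logical plan is modelled as a plain string. The sketch shows only the ownership relationship: a new session reuses its parent's shared state, and with it the same cache manager. In Spark, SparkSession.newSession() likewise shares the underlying SparkContext and cached data.

```scala
import scala.collection.mutable

// Hypothetical stand-in for CacheManager: only tracks cached plans (as strings).
final class ToyCacheManager {
  val cachedPlans: mutable.LinkedHashSet[String] = mutable.LinkedHashSet.empty[String]
}

// Hypothetical stand-in for SharedState: owns the single cache manager.
final class ToySharedState {
  val cacheManager = new ToyCacheManager
}

// Hypothetical stand-in for a session: points at the shared state.
final class ToySession(val sharedState: ToySharedState) {
  // Mirrors the idea behind SparkSession.newSession(): a new session
  // with the same shared state, and therefore the same cached data.
  def newSession(): ToySession = new ToySession(sharedState)
}

val first  = new ToySession(new ToySharedState)
val second = first.newSession()

// Caching through one session is visible through the other.
first.sharedState.cacheManager.cachedPlans += "Range (0, 10)"
println(second.sharedState.cacheManager.cachedPlans.contains("Range (0, 10)")) // true
```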
cachedData Internal Registry
cachedData is a collection of CachedData entries, each pairing a logical plan with its cached InMemoryRelation representation.
A new CachedData is added when a Dataset is cached. It is removed when the Dataset is uncached or when cached data is invalidated for a resource path.
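The lifecycle of cachedData entries can be sketched as a small registry. This is a simplified model under stated assumptions, not Spark's implementation: ToyPlan, ToyCachedData and ToyCachedDataRegistry are hypothetical, a plan is just a description plus the resource paths it reads, and the cached representation is a placeholder string rather than an InMemoryRelation. Plans are compared with plain equality, whereas Spark matches plans with sameResult. For invalidation, only the removal step described above is modelled.

```scala
// Hypothetical plan: a description plus the resource paths it reads.
final case class ToyPlan(description: String, paths: Set[String])
// Hypothetical CachedData: the plan and a placeholder for its cached relation.
final case class ToyCachedData(plan: ToyPlan, cachedRepresentation: String)

final class ToyCachedDataRegistry {
  private var cachedData = Vector.empty[ToyCachedData]

  // Added when a Dataset is cached (plain equality, unlike Spark's sameResult).
  def add(plan: ToyPlan): Unit =
    if (!cachedData.exists(_.plan == plan))
      cachedData :+= ToyCachedData(plan, s"InMemoryRelation(${plan.description})")

  // Removed when a Dataset is uncached.
  def remove(plan: ToyPlan): Unit =
    cachedData = cachedData.filterNot(_.plan == plan)

  // Removed when cached data is invalidated for a resource path:
  // every entry whose plan reads that path is dropped.
  def invalidatePath(path: String): Unit =
    cachedData = cachedData.filterNot(_.plan.paths.contains(path))

  def plans: Seq[ToyPlan] = cachedData.map(_.plan)
}

val registry = new ToyCachedDataRegistry
val scanA = ToyPlan("scan a", Set("/data/a"))
val scanB = ToyPlan("scan b", Set("/data/b"))
registry.add(scanA)
registry.add(scanB)
registry.invalidatePath("/data/a")
println(registry.plans.map(_.description)) // Vector(scan b)
```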
invalidateCachedPath Method
Caution: FIXME
invalidateCache Method
Caution: FIXME
lookupCachedData Method
Caution: FIXME
uncacheQuery Method
Caution: FIXME
isEmpty Method
Caution: FIXME
Caching Dataset — cacheQuery Method
cacheQuery(
  query: Dataset[_],
  tableName: Option[String] = None,
  storageLevel: StorageLevel = MEMORY_AND_DISK): Unit
cacheQuery obtains the analyzed logical plan of the query and saves it, as an InMemoryRelation, in the internal cachedData collection of cached queries.
If the query has already been cached, cacheQuery does not cache it again, and you should see the following WARN message in the logs instead:
WARN CacheManager: Asked to cache already cached data.
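The caching-or-warning behaviour above can be sketched with a simplified model. Assumptions, loudly: ToyCacheManager and ToyInMemoryRelation are hypothetical, the analyzed plan is a string, the storage level is a plain string whose default mirrors MEMORY_AND_DISK in the signature, and warnings are collected in a vector instead of being written to Spark's logger. It shows that a second request for an already cached plan leaves the existing entry untouched and only produces the warning.

```scala
// Hypothetical stand-in for InMemoryRelation: records what was cached and how.
final case class ToyInMemoryRelation(plan: String, tableName: Option[String], storageLevel: String)

final class ToyCacheManager {
  private var cachedData = Vector.empty[ToyInMemoryRelation]
  // Collected here instead of being written to a logger.
  private var warnings = Vector.empty[String]

  // Parameters mirror cacheQuery's signature, with simplified types.
  def cacheQuery(
      analyzedPlan: String,
      tableName: Option[String] = None,
      storageLevel: String = "MEMORY_AND_DISK"): Unit =
    if (cachedData.exists(_.plan == analyzedPlan)) {
      // Already cached: keep the existing entry and only warn.
      warnings :+= "WARN CacheManager: Asked to cache already cached data."
    } else {
      cachedData :+= ToyInMemoryRelation(analyzedPlan, tableName, storageLevel)
    }

  def entries: Vector[ToyInMemoryRelation] = cachedData
  def loggedWarnings: Vector[String] = warnings
}

val manager = new ToyCacheManager
manager.cacheQuery("Range (0, 10)")
manager.cacheQuery("Range (0, 10)", storageLevel = "MEMORY_ONLY") // ignored, warns
println(manager.entries.map(_.storageLevel)) // Vector(MEMORY_AND_DISK)
println(manager.loggedWarnings.size)         // 1
```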
Removing All Cached Tables From In-Memory Cache — clearCache Method
clearCache(): Unit
clearCache acquires a write lock, unpersists the RDD[CachedBatch] of every query in cachedData, and then removes all the entries.
Note: clearCache is executed when CatalogImpl is requested to clearCache.
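The order of operations in clearCache (lock, unpersist, remove) can be sketched as follows. This is a simplified model, not Spark's code: ToyCachedBuffers is a hypothetical stand-in for the RDD[CachedBatch] behind an InMemoryRelation that only records whether it was unpersisted, and ToyCacheManager and its cache helper are likewise hypothetical. The locking uses java.util.concurrent's ReentrantReadWriteLock. In an application you would normally reach the real method through spark.catalog.clearCache().

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical stand-in for an RDD[CachedBatch]: only tracks persistence.
final class ToyCachedBuffers(val name: String) {
  @volatile var persisted: Boolean = true
  def unpersist(): Unit = { persisted = false }
}

final case class ToyCachedData(plan: String, buffers: ToyCachedBuffers)

final class ToyCacheManager {
  private val lock = new ReentrantReadWriteLock()
  private var cachedData = Vector.empty[ToyCachedData]

  // Run `body` while holding the write lock, releasing it even on failure.
  private def writeLock[A](body: => A): A = {
    lock.writeLock().lock()
    try body finally lock.writeLock().unlock()
  }

  // Hypothetical helper that registers a plan with fresh buffers.
  def cache(plan: String): ToyCachedBuffers = writeLock {
    val buffers = new ToyCachedBuffers(plan)
    cachedData :+= ToyCachedData(plan, buffers)
    buffers
  }

  // Unpersist every entry's buffers first, then drop all entries.
  def clearCache(): Unit = writeLock {
    cachedData.foreach(_.buffers.unpersist())
    cachedData = Vector.empty
  }

  // Reads take the shared read lock.
  def isEmpty: Boolean = {
    lock.readLock().lock()
    try cachedData.isEmpty finally lock.readLock().unlock()
  }
}

val manager = new ToyCacheManager
val a = manager.cache("Range (0, 10)")
val b = manager.cache("Range (0, 20)")
manager.clearCache()
println(manager.isEmpty)            // true
println(a.persisted || b.persisted) // false
```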
CachedData
Caution: FIXME